Given the current housing crisis and housing shortage in the UK, I became interested in how different regions of England particularly are performing with their supply of new homes. I had a look into figures on gov.uk and found data on net additional dwellings completed, defined as “the absolute change in stock between one year and the next, including losses and gains from new builds, conversions, changes of use (for example a residential house to an office) and demolitions”.
Given that bigger regions would have more housing, I found data on the population of regions by year from the Office for National Statistics to created a per 1000 people figure for a given region at a given year. This would allow truer comparisons of regions.
#readODS package needed to read the Net Additional Dwellings .ods dataset
if(!require(readODS)) install.packages("readODS")
## Loading required package: readODS
library(readODS)
##readxl package needed to read the Regional Population .xlsx dataset
if(!require(readxl)) install.packages("readxl")
## Loading required package: readxl
library(readxl)
#Library required for data wrangling
if(!require(tidyverse)) install.packages("tidyverse")
## Loading required package: tidyverse
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(tidyr)
library(dplyr)
##ggplot2 allows for plotting with ggplot
if(!require(ggplot2)) install.packages("ggplot2")
library(ggplot2)
##plotly allows interactivity from ggplot plots. I like the use of plot zooming and custom hover text which is why plotly was used
if(!require(plotly)) install.packages("plotly")
## Loading required package: plotly
##
## Attaching package: 'plotly'
##
## The following object is masked from 'package:ggplot2':
##
## last_plot
##
## The following object is masked from 'package:stats':
##
## filter
##
## The following object is masked from 'package:graphics':
##
## layout
library(plotly)
#Adding data to a data frame
#Ensure data names match what I've named them below and are stored in a "data" file if you are downloading from sources yourself
data_dwell <- read_ods("data/raw_netadditionaldwellings.ods", sheet = 5)
## New names:
## • `` -> `...2`
## • `` -> `...3`
## • `` -> `...4`
## • `` -> `...5`
## • `` -> `...6`
## • `` -> `...7`
## • `` -> `...8`
## • `` -> `...9`
## • `` -> `...10`
## • `` -> `...11`
## • `` -> `...12`
data_pop <- read_xlsx("data/raw_population.xlsx", sheet = 11)
## New names:
## • `` -> `...2`
## • `` -> `...3`
## • `` -> `...4`
## • `` -> `...5`
## • `` -> `...6`
## • `` -> `...7`
## • `` -> `...8`
## • `` -> `...9`
## • `` -> `...10`
## • `` -> `...11`
## • `` -> `...12`
## • `` -> `...13`
## • `` -> `...14`
## • `` -> `...15`
## • `` -> `...16`
#See new imported data sets are what we need
head(data_dwell)
## # A tibble: 6 × 12
## Table 118 Annual net …¹ ...2 ...3 ...4 ...5 ...6 ...7 ...8 ...9 ...10
## <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 This worksheet contains… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 2 Back to contents <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 3 Components of net housi… Year Nort… Nort… York… East… West… East… Lond… Sout…
## 4 Net additions [note 1] 2000… 2890 10720 10800 14830 13790 17780 15710 25680
## 5 Net additions [note 1] 2001… 4489 13543 12752 16188 13052 19154 19688 26219
## 6 Net additions [note 1] 2002… 5343 18120 13452 16938 13742 21554 21648 27269
## # ℹ abbreviated name:
## # ¹`Table 118 Annual net additional dwellings and components, England and the regions, 2000-01 to 2023-24, unrounded`
## # ℹ 2 more variables: ...11 <chr>, ...12 <chr>
head(data_pop)
## # A tibble: 6 × 16
## MYE4: Population estim…¹ ...2 ...3 ...4 ...5 ...6 ...7 ...8 ...9 ...10
## <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 This worksheet contains… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 2 To turn off freeze pane… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 3 Please choose from the … <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 4 This met my needs, plea… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 5 I need something slight… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 6 This is not what I need… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## # ℹ abbreviated name:
## # ¹`MYE4: Population estimates: Summary for England and Wales, mid-2011 to mid-2023`
## # ℹ 6 more variables: ...11 <chr>, ...12 <chr>, ...13 <chr>, ...14 <chr>,
## # ...15 <chr>, ...16 <chr>
#Will be merging these data sets so we get a figure of the number of new housing units relative to that region's population for a given year, by 1000 people to allow regional comparisons
#Data needs to match by Year and Region, therefore need to ensure relevant wrangling is completed...
# Wrangle: Net Additional Dwellings data
#We are looking at regions of England, from 2011 to 2023 (as no pre-2011 population data was available unfortunately)
head(data_dwell)
## # A tibble: 6 × 12
## Table 118 Annual net …¹ ...2 ...3 ...4 ...5 ...6 ...7 ...8 ...9 ...10
## <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 This worksheet contains… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 2 Back to contents <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 3 Components of net housi… Year Nort… Nort… York… East… West… East… Lond… Sout…
## 4 Net additions [note 1] 2000… 2890 10720 10800 14830 13790 17780 15710 25680
## 5 Net additions [note 1] 2001… 4489 13543 12752 16188 13052 19154 19688 26219
## 6 Net additions [note 1] 2002… 5343 18120 13452 16938 13742 21554 21648 27269
## # ℹ abbreviated name:
## # ¹`Table 118 Annual net additional dwellings and components, England and the regions, 2000-01 to 2023-24, unrounded`
## # ℹ 2 more variables: ...11 <chr>, ...12 <chr>
#Looks like we need to remove some blank columns and rows, remove the England column, rename Year column data to just numbers, remove pre-2011 data, and convert the regions to long
#remove 1st 2 rows as they are irrelevant
data_dwell <- data_dwell[-1:-2, ]
#make first row the headings for columns, then delete first row
colnames(data_dwell) <- data_dwell[1, ]
data_dwell <- data_dwell[-1, ]
#Check if that looks right
head(data_dwell)
## # A tibble: 6 × 12
## Components of net hou…¹ Year `North East` `North West` Yorkshire and The Hu…²
## <chr> <chr> <chr> <chr> <chr>
## 1 Net additions [note 1] 2000… 2890 10720 10800
## 2 Net additions [note 1] 2001… 4489 13543 12752
## 3 Net additions [note 1] 2002… 5343 18120 13452
## 4 Net additions [note 1] 2003… 5231 21853 16252
## 5 Net additions [note 1] 2004… 6972 21411 15024
## 6 Net additions [note 1] 2005… 6927 23832 18684
## # ℹ abbreviated names: ¹`Components of net housing supply`,
## # ²`Yorkshire and The Humber`
## # ℹ 7 more variables: `East Midlands` <chr>, `West Midlands` <chr>,
## # `East of England` <chr>, London <chr>, `South East` <chr>,
## # `South West` <chr>, England <chr>
#Delete "England" column as we're just seeing regional differences
data_dwell <- data_dwell %>%
select(-"England")
#select only Net data
#Backticks `` used to select labels with spaces in them
data_dwell <- data_dwell %>% filter(
str_starts(`Components of net housing supply`, "Net additions")
)
#can now delete this "Components of net housing supply" column as have used filters
data_dwell <- data_dwell %>%
select(-`Components of net housing supply`)
#rename Year as first 4 digits: cannot convert to numeric otherwise
data_dwell$Year <- str_sub(data_dwell$Year, 1, 4)
#remove data pre-2011 to match the dataset with the population data we have
data_dwell <- data_dwell %>%
filter(Year >= 2011)
#make long
data_dwell <- pivot_longer(data_dwell,
cols = -Year, #all columns except year as it's already long
names_to = "Region",
values_to = "Net_Additional_Dwellings")
#Quick check whether we have missing data
is.na(data_dwell)
## Year Region Net_Additional_Dwellings
## [1,] FALSE FALSE FALSE
## [2,] FALSE FALSE FALSE
## [3,] FALSE FALSE FALSE
## [4,] FALSE FALSE FALSE
## [5,] FALSE FALSE FALSE
## [6,] FALSE FALSE FALSE
## [7,] FALSE FALSE FALSE
## [8,] FALSE FALSE FALSE
## [9,] FALSE FALSE FALSE
## [10,] FALSE FALSE FALSE
## [11,] FALSE FALSE FALSE
## [12,] FALSE FALSE FALSE
## [13,] FALSE FALSE FALSE
## [14,] FALSE FALSE FALSE
## [15,] FALSE FALSE FALSE
## [16,] FALSE FALSE FALSE
## [17,] FALSE FALSE FALSE
## [18,] FALSE FALSE FALSE
## [19,] FALSE FALSE FALSE
## [20,] FALSE FALSE FALSE
## [21,] FALSE FALSE FALSE
## [22,] FALSE FALSE FALSE
## [23,] FALSE FALSE FALSE
## [24,] FALSE FALSE FALSE
## [25,] FALSE FALSE FALSE
## [26,] FALSE FALSE FALSE
## [27,] FALSE FALSE FALSE
## [28,] FALSE FALSE FALSE
## [29,] FALSE FALSE FALSE
## [30,] FALSE FALSE FALSE
## [31,] FALSE FALSE FALSE
## [32,] FALSE FALSE FALSE
## [33,] FALSE FALSE FALSE
## [34,] FALSE FALSE FALSE
## [35,] FALSE FALSE FALSE
## [36,] FALSE FALSE FALSE
## [37,] FALSE FALSE FALSE
## [38,] FALSE FALSE FALSE
## [39,] FALSE FALSE FALSE
## [40,] FALSE FALSE FALSE
## [41,] FALSE FALSE FALSE
## [42,] FALSE FALSE FALSE
## [43,] FALSE FALSE FALSE
## [44,] FALSE FALSE FALSE
## [45,] FALSE FALSE FALSE
## [46,] FALSE FALSE FALSE
## [47,] FALSE FALSE FALSE
## [48,] FALSE FALSE FALSE
## [49,] FALSE FALSE FALSE
## [50,] FALSE FALSE FALSE
## [51,] FALSE FALSE FALSE
## [52,] FALSE FALSE FALSE
## [53,] FALSE FALSE FALSE
## [54,] FALSE FALSE FALSE
## [55,] FALSE FALSE FALSE
## [56,] FALSE FALSE FALSE
## [57,] FALSE FALSE FALSE
## [58,] FALSE FALSE FALSE
## [59,] FALSE FALSE FALSE
## [60,] FALSE FALSE FALSE
## [61,] FALSE FALSE FALSE
## [62,] FALSE FALSE FALSE
## [63,] FALSE FALSE FALSE
## [64,] FALSE FALSE FALSE
## [65,] FALSE FALSE FALSE
## [66,] FALSE FALSE FALSE
## [67,] FALSE FALSE FALSE
## [68,] FALSE FALSE FALSE
## [69,] FALSE FALSE FALSE
## [70,] FALSE FALSE FALSE
## [71,] FALSE FALSE FALSE
## [72,] FALSE FALSE FALSE
## [73,] FALSE FALSE FALSE
## [74,] FALSE FALSE FALSE
## [75,] FALSE FALSE FALSE
## [76,] FALSE FALSE FALSE
## [77,] FALSE FALSE FALSE
## [78,] FALSE FALSE FALSE
## [79,] FALSE FALSE FALSE
## [80,] FALSE FALSE FALSE
## [81,] FALSE FALSE FALSE
## [82,] FALSE FALSE FALSE
## [83,] FALSE FALSE FALSE
## [84,] FALSE FALSE FALSE
## [85,] FALSE FALSE FALSE
## [86,] FALSE FALSE FALSE
## [87,] FALSE FALSE FALSE
## [88,] FALSE FALSE FALSE
## [89,] FALSE FALSE FALSE
## [90,] FALSE FALSE FALSE
## [91,] FALSE FALSE FALSE
## [92,] FALSE FALSE FALSE
## [93,] FALSE FALSE FALSE
## [94,] FALSE FALSE FALSE
## [95,] FALSE FALSE FALSE
## [96,] FALSE FALSE FALSE
## [97,] FALSE FALSE FALSE
## [98,] FALSE FALSE FALSE
## [99,] FALSE FALSE FALSE
## [100,] FALSE FALSE FALSE
## [101,] FALSE FALSE FALSE
## [102,] FALSE FALSE FALSE
## [103,] FALSE FALSE FALSE
## [104,] FALSE FALSE FALSE
## [105,] FALSE FALSE FALSE
## [106,] FALSE FALSE FALSE
## [107,] FALSE FALSE FALSE
## [108,] FALSE FALSE FALSE
## [109,] FALSE FALSE FALSE
## [110,] FALSE FALSE FALSE
## [111,] FALSE FALSE FALSE
## [112,] FALSE FALSE FALSE
## [113,] FALSE FALSE FALSE
## [114,] FALSE FALSE FALSE
## [115,] FALSE FALSE FALSE
## [116,] FALSE FALSE FALSE
## [117,] FALSE FALSE FALSE
#All values come back False, meaning no missing values
#Convert to numeric to allow plotting
data_dwell$Year <- as.numeric(data_dwell$Year)
data_dwell$Net_Additional_Dwellings <- as.numeric(data_dwell$Net_Additional_Dwellings)
#How does it look?
print(data_dwell)
## # A tibble: 117 × 3
## Year Region Net_Additional_Dwellings
## <dbl> <chr> <dbl>
## 1 2011 North East 3939
## 2 2011 North West 10612
## 3 2011 Yorkshire and The Humber 12066
## 4 2011 East Midlands 12426
## 5 2011 West Midlands 10206
## 6 2011 East of England 18460
## 7 2011 London 29672
## 8 2011 South East 24835
## 9 2011 South West 18570
## 10 2012 North East 3589
## # ℹ 107 more rows
#Looks good!
My initial thoughts: I was quite surprised to see how few homes were being built - I’d expected more of a noticeable increase as years and populations increased. There is the obvious dip in 2020 and lag to 2021 due to halting construction activity, and it seems we’re still in a sort of housing slog.
Although my project offers an interesting insight into new housing supply differences across English regions, it of course overlooks the contextual and political factors involved in housing supply. Economic conditions (e.g., unemployment, income levels), government policies (e.g., housing subsidies, planning permissions/ regulations) and details regarding types of housing or population density are not considered. Future projects may wish to investigate adjusting to these factors for a more nuanced insight.
My initial idea for this project was to use data on affordable units of housing available regionally and find proportions of net housing that are affordable, to see where in England there are the most affordable options (both ownership and rent) for housing. This fell short when I realised the affordable housing data is only available as a gross figure (i.e. not taking into account any losses of affordable housing), therefore any data presented would be misleading. This would be a really interesting future idea given the net data is published. There is some “Official Statistics in Development” data only on affordable housing for rent, if interested (see references).
All data was extracted from the Office for National Statistics and from gov.uk.
Net Additional Dwellings data: Table 118: annual net
additional dwellings and components, England and the regions (ODS, 54KB)
-
https://www.gov.uk/government/statistical-data-sets/live-tables-on-net-supply-of-housing#live-tables
(Updated: 28 November 2024)
Involvement from: Ministry of Housing, Communities and Local Government, Ministry of Housing, Communities & Local Government (2018 to 2021), Department for Levelling Up, Housing and Communities
Population estimates data: Mid-2023: 2023 local authority
boundaries edition of this dataset edition of this dataset (xlsx,
813.1KB) -
https://www.ons.gov.uk/peoplepopulationandcommunity/populationandmigration/populationestimates/datasets/estimatesofthepopulationforenglandandwales
(Released: 15 July 2024)
Unused data: Links for anyone interested in affordable data
sets mentioned:
https://www.gov.uk/government/statistical-data-sets/live-tables-on-affordable-housing-supply
https://assets.publishing.service.gov.uk/media/65c0b5dec43191000d1a451f/Net_Affordable_Housing_for_Rent.ods/preview